Abstract

The oracle identification problem (OIP) was introduced by Ambainis et al. [1]. It is given as a
set $S$ of $M$ oracles and a blackbox oracle $f$. Our task is to figure out which oracle in $S$ is equal to
the blackbox $f$ by making queries to $f$. OIP includes several problems, such as the Grover Search, as
special cases. In this paper, we improve the algorithms in [1] by providing a mostly optimal upper
bound on the query complexity of this problem: (i) For any oracle set $S$ such that $|S| \le 2^{N^d}$ ($d < 1$),
we design an algorithm whose query complexity is $O(\sqrt{N \log M / \log N})$, matching the lower bound
proved in [1]. (ii) Our algorithm also works for the range between $2^{N^d}$ and $2^{N/\log N}$ (where the
bound becomes $O(N)$), but the gap between the upper and lower bounds worsens gradually. (iii)
Our algorithm is robust, namely, it exhibits the same performance (up to a constant factor) against
noisy oracles, as also shown in the literature [2, 12, 21] for special cases of OIP.

keywords: quantum computing, query complexity and algorithmic learning theory 
1 Introduction

We study the following problem, called the Oracle Identification Problem (OIP): Given a hidden
$N$-bit vector $f = (a_1, \ldots, a_N) \in \{0,1\}^N$, called an oracle, and a candidate set $S \subseteq \{0,1\}^N$, OIP
requires us to find which oracle in $S$ is equal to $f$. OIP has been especially popular since the
emergence of quantum computation, e.g., [7J |SJ OH El OH EH-. For example, suppose that we set
$S = \{(a_1, \ldots, a_N) \mid \text{exactly one } a_i = 1\}$. Then this OIP is essentially the same as Grover search [20].
In [1], Ambainis et al. extended the problem to a general $S$. They proved that the total cost of any OIP
with $|S| = N$ is $O(\sqrt{N})$, which is optimal within a constant factor since this includes the Grover search
as a special case, and for the latter an $\Omega(\sqrt{N})$ lower bound is known (e.g., (Hj). For a larger $S$, they
obtained nontrivial upper and lower bounds, $O(\sqrt{N \log M \log N}\,\log\log M)$ and $\Omega(\sqrt{N \log M / \log N})$,
respectively, but unfortunately, there is a fairly large gap between them.

Our Result. Let $M = |S|$. (i) If $M \le 2^{N^d}$ for a constant $d$ ($< 1$), then the cost of our new
algorithm is $O(\sqrt{N \log M / \log N})$, which matches the lower bound obtained in [1]. (Previously, an
optimal upper bound was known only for $M = N$.) (ii) For the range between $2^{N^d}$ and $2^{N/\log N}$, our
algorithm works without any modification and the (gradually growing) gap to the lower bound is
at most a factor of $O(\sqrt{\log N \log\log N})$. (iii) Our algorithm is robust, namely, it exhibits the same
performance (up to a constant factor) against noisy oracles, as shown in the literature [2, 12, 21]
for special cases of OIP.

Our algorithms use two operations: (i) The first one is a simple query (S-query) to the hidden
oracle, i.e., to obtain the value (0 or 1) of $a_i$ by specifying the $\log N$-bit index $i$. The cost of each
such query is one. (ii) The second one is called a G-query to the oracle: By specifying a set
$T = \{i_1, \ldots, i_r\}$ of indices, we obtain an index $i_j \in T$ such that $a_{i_j} = 1$ if there is one, and nill otherwise.
If there are two or more such $i_j$'s, then one of them is chosen at random. The cost of this query is
$O(\sqrt{|T|/K})$ where $K = |\{j \mid i_j \in T \text{ and } a_{i_j} = 1\}| + 1$. This query is stochastic, i.e., the answer is
correct with a constant probability. Our goal is to minimize the cost of solving the OIP
with a constant success probability. Note that we incur cost only for S- and G-queries (i.e., the
cost of any other computation is zero), and it turns out that our query model is equivalent to the
standard query complexity model.

S-queries are standard and may not need any explanation. G-queries are, as one can see, nothing but
Grover Search. So, they cannot be implemented in the framework of classical computation, and
hence our paper is definitely a quantum paper. However, if we use the two queries as blackbox
subroutines and follow the above complexity measure, then our algorithm design is completely
classical. Now it is important to observe the "efficiency" of G-queries. Since their cost is sublinear in
$|T|$, our general idea is that it is more cost-effective to use them for a larger $T$. For example, the cost
of a single G-query for $|T| = L$ is less than the total cost of three G-queries for $|T'| = L/3$. However,
it is also true that the former is less informative since it gives us only one bit-position in $T$ which has
value one, while the latter gives us three. Thus, as one would expect, selecting the size of $T$ is a key
issue when using G-queries.
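The trade-off can be made concrete with a small cost calculator. This is only a sketch of the cost rule stated above: the hidden constant in the $O(\cdot)$ is assumed to be 1, and the function names and the value $L = 900$ are illustrative choices, not part of the paper.

```python
import math


def s_query_cost() -> float:
    """Cost of one S-query: exactly one oracle call."""
    return 1.0


def g_query_cost(t_size: int, ones_in_t: int = 0) -> float:
    """Cost of one G-query on |T| = t_size positions.

    Follows the rule sqrt(|T| / K) with K = (number of 1's in T) + 1.
    The constant hidden in the O(.) notation is assumed to be 1.
    """
    k = ones_in_t + 1
    return math.sqrt(t_size / k)


L = 900
one_large = g_query_cost(L)               # a single G-query on L positions
three_small = 3 * g_query_cost(L // 3)    # three G-queries on L/3 positions each
print(one_large, three_small)
```

Under these assumptions the single large query is cheaper by a factor of $\sqrt{3}$, but it reveals at most one position holding a 1, whereas the three small queries can reveal up to three.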

As mentioned earlier, if we use the two queries as blackbox subroutines together with their cost
rule, then no knowledge of quantum computation is needed in the design and analysis of our
algorithms. Since $S$ is a set of $M$ 0/1-vectors of length $N$, it is naturally given as a 0/1 matrix $Z$
of $N$ columns and $M$ rows. For a given $Z$, our basic strategy is quite simple: if there is a column
which includes a balanced number of 0's and 1's, then we ask the value of the oracle at that position
by using an S-query. This reduces the number of candidates by a constant factor. Otherwise, i.e.,
if every column has, say, a small fraction of 1's, then S-queries may seldom reduce the candidates.
In such a situation, the idea is that it is better to use a G-query on a certain set $T$ of
columns than to repeat S-queries. In order to optimize this strategy, our new algorithm controls
the size of $T$ very carefully. This contrasts with the previous method [1], which always uses G-queries
with $T = \{1, \ldots, N\}$.
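The S-query half of this strategy can be sketched classically as follows. The helper names and the toy matrix are hypothetical, the oracle is assumed noiseless, and columns are 0-indexed; the code only illustrates how a balanced column halves the candidate rows.

```python
from typing import List


def most_balanced_column(rows: List[List[int]]) -> int:
    """Return the index of the column whose fraction of 1's is closest to 1/2."""
    m = len(rows)
    n = len(rows[0])

    def imbalance(j: int) -> float:
        ones = sum(row[j] for row in rows)
        return abs(ones / m - 0.5)

    return min(range(n), key=imbalance)


def filter_by_answer(rows: List[List[int]], j: int, bit: int) -> List[List[int]]:
    """Keep only the candidate rows consistent with the answer f(j) = bit."""
    return [row for row in rows if row[j] == bit]


# Toy candidate matrix Z with M = 4 rows (oracles) and N = 4 columns.
Z = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]
j = most_balanced_column(Z)
remaining = filter_by_answer(Z, j, 1)   # suppose the S-query answered f(j) = 1
print(j, remaining)
```

When no column is close to balanced, this filter removes few rows whichever answer comes back, which is the situation in which the text switches to G-queries.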

Previous Work. Suppose that we wish to solve some problem over input data of $N$ bits.
Presumably, we need all the values of these $N$ bits to obtain a correct answer, which in turn requires $N$
(simple) queries to the data. In certain situations, we do not need all the values, which allows us
to design a variety of sublinear-time (classical) algorithms, e.g., [13, 19, 23]. This is also true when
the input is given with some premise, for which giving a candidate set as in this paper is the most
general method. Quickly approaching the hidden data using the premise information is the basis of
algorithmic learning theory. In fact, Atici et al. in [8] independently use techniques similar to ours in
the context of quantum learning theory. One of their results, which states the existence of a quantum
algorithm for learning a concept class $S$ whose parameter is $\hat{\gamma}^S$ with $O(\log|S| \log\log|S| / \sqrt{\hat{\gamma}^S})$ queries,
almost establishes a conjecture of $O(\log|S| / \sqrt{\hat{\gamma}^S})$ queries in [22].

Recall that our complexity measure is the (quantum) query complexity, which has been intensively
studied as a central issue of quantum computation. The most remarkable result is due to Grover [20],
which provided a number of applications and extensions, e.g., [HHI3E]-. Recently, quite a few results
on efficient quantum algorithms have been obtained by "sophisticated" ways of using the Grover Search. (Our
present paper is also in this category.) Brassard et al. ^Uj showed a quantum counting algorithm that
gives an approximate counting method by combining the Grover Search with the quantum Fourier
transform. Quantum algorithms for the claw-finding and the element distinctness problems given
by Buhrman et al. also exploited classical random and sorting methods with the Grover Search.
(Ambainis [3] developed an optimal quantum algorithm with $O(N^{2/3})$ queries for the element distinctness
problem, which makes use of the quantum walk and matches the lower bound shown by Shi [25].)
Aaronson et al. constructed quantum search algorithms for spatial regions by combining the Grover
Search with the divide-and-conquer method. Magniez et al. [24] showed efficient quantum algorithms
to find a triangle in a given graph by using combinatorial techniques with the Grover Search. Dürr et
al. [16] also investigated the quantum query complexity of several graph-theoretic problems. In particular,
they exploited the Grover Search on some data structures of graphs for their upper bounds.

Recently, two papers, by Høyer et al. [21] and Buhrman et al. [12], raised the question of how to
cope with "imperfect" oracles in the quantum case using the following model: The oracle returns,
for the query to bit $a_i$, a quantum pure state from which we can measure the correct value of $a_i$
with a constant probability. This noise model naturally fits the motivation that a similar mechanism
should apply when we use bounded-error quantum subroutines. In [21], Høyer et al. gave a quantum
algorithm that robustly solves Grover's problem with $O(\sqrt{N})$ queries, which is only a constant
factor worse than the noiseless case. Buhrman et al. [12] also gave a robust quantum algorithm to
output all the $N$ bits by using $O(N)$ queries. This obviously implies that $O(N)$ queries are enough
to compute the parity of the $N$ bits, which contrasts with the classical $\Omega(N \log N)$ lower bound given
in [T^]. Thus, robust quantum computation does not need a serious overhead, at least for several
important problems, including the OIP discussed in this paper.

2 S-queries, G-queries and Robustness 

Recall that an instance of OIP is given as a set $S = \{f_1, \ldots, f_M\}$ of oracles, each $f_i = (f_i(1), \ldots, f_i(N)) \in \{0,1\}^N$,
and a hidden oracle $f \in S$ which is not known in advance. We are asked to find the index $i$
such that $f = f_i$. We can access the hidden oracle $f$ through a unitary transformation $U_f$, which is
referred to as an oracle call, such that

$$U_f\,|x\rangle|0\rangle = |x\rangle|f(x)\rangle,$$

where $1 \le x \le N$ denotes the bit-position of $f$ whose value (0 or 1) we wish to know. This bit-position
might be a superposition of two or more bit-positions, i.e., $\sum_i \alpha_i |x_i\rangle$. Then the result of the oracle
call is also a superposition, i.e., $\sum_i \alpha_i |x_i\rangle|f(x_i)\rangle$. The query complexity counts the number of oracle
calls necessary to obtain a correct answer $i$ with a constant probability.

In this paper we will not use oracle calls directly but through two subroutines, S-queries and
G-queries. (Both can be viewed as classical subroutines when used.) An S-query, SQ(i), is simply a
single oracle call with the index $i$ plus observation. It returns $f(i)$ with probability one and its query
complexity is obviously one. A G-query, GQ(T), where $T \subseteq \{1, \ldots, N\}$, returns $1 \le i \le N$ such that
$i \in T$ and $f(i) = 1$ if such an $i$ exists, and nill otherwise. We admit an error, namely, the answer may be
incorrect but should be correct with a constant probability, say, 2/3. Although details are omitted, it
is easy to see that GQ(T) can be implemented by applying Grover Search only to the selected positions
$T$. Its query complexity is given by the following lemma.

Lemma 1 ([10]) GQ(T) needs $O(\sqrt{|T|/K})$ oracle calls, where $K = |\{j \mid j \in T \text{ and } f(j) = 1\}| + 1$.
If $f$ is a noisy oracle, then its unitary transformation is given as follows:

$$\tilde{U}_f\,|x\rangle|0\rangle|0\rangle = \sqrt{p_x}\,|x\rangle|\phi_x\rangle|f(x)\rangle + \sqrt{1 - p_x}\,|x\rangle|\psi_x\rangle|\neg f(x)\rangle,$$

where $2/3 \le p_x \le 1$, and $|\phi_x\rangle$ and $|\psi_x\rangle$ (the states of working registers) may depend on $x$. As before, $|x\rangle$
(and hence the result also) may be a superposition of bit-positions. Since an oracle call itself includes
an error, an S-query should also be stochastic: $\widetilde{SQ}(i)$ returns $f(i)$ with probability at least 2/3 (and
$\neg f(i)$ with probability at most 1/3). G-queries, $\widetilde{GQ}(T)$, are already stochastic, i.e., they succeed in finding an answer
with probability at least 2/3 if there exists one, and they do not need modification.

Lemma 2 ([21]) Let $K$ and $T$ be as before. Then $\widetilde{GQ}(T)$ needs $O(\sqrt{|T|/K})$ noisy oracle calls.

In this paper our oracle model is almost always noisy. Therefore we simply use the notation SQ
and GQ instead of $\widetilde{SQ}$ and $\widetilde{GQ}$.
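Because the algorithms only see the two subroutines and their costs, they can be exercised against a purely classical stand-in. The class below is such a sketch, not a quantum simulation: the class name, the way a failed G-query picks its wrong answer, the 0-based indexing, and the unit constant in the cost rule are all assumptions made for illustration.

```python
import math
import random
from typing import Iterable, List, Optional


class SimulatedOracle:
    """Classical stand-in for a hidden oracle f with cost accounting (0-based indices)."""

    def __init__(self, f: List[int], error_prob: float = 1 / 3, seed: int = 0) -> None:
        self.f = list(f)
        self.error_prob = error_prob      # assumed failure rate of each query
        self.rng = random.Random(seed)
        self.cost = 0.0                   # accumulated oracle calls (constants set to 1)

    def sq(self, i: int) -> int:
        """S-query: return f(i), flipped with probability error_prob."""
        self.cost += 1.0
        flipped = self.rng.random() < self.error_prob
        return 1 - self.f[i] if flipped else self.f[i]

    def gq(self, t: Iterable[int]) -> Optional[int]:
        """G-query: return some i in T with f(i) = 1, or None (nill) if there is none."""
        positions = sorted(t)
        ones = [i for i in positions if self.f[i] == 1]
        self.cost += math.sqrt(len(positions) / (len(ones) + 1))
        if self.rng.random() < self.error_prob:
            # A failed run may return anything; here: an arbitrary position or None.
            return self.rng.choice(positions + [None])
        return self.rng.choice(ones) if ones else None


oracle = SimulatedOracle([0, 0, 1, 0], error_prob=0.0)
print(oracle.sq(2), oracle.gq({0, 1, 2, 3}), oracle.gq({0, 1}), oracle.cost)
```

Setting `error_prob=0.0` gives the noiseless model of Lemma 1, while the default of 1/3 corresponds to the worst case allowed for the noisy model.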

3 Algorithms for Small Candidate Sets 
3.1 Overview of the Algorithm 

Recall that the candidate set $S$ ($|S| = M$) is given as an $M \times N$ matrix $Z$. Before we give our main
result in the next section, we discuss in this section the case that $Z$ is small, i.e., $M = \mathrm{poly}(N)$, which
we need in the main algorithm and which also helps to understand the basic idea. Since our goal is
to find a single row among the $M$ rows, a natural strategy is to reduce the number of candidate rows
(a subset of rows denoted by $S$) step by step. This can be done easily if there is a column, say $j$,
which is "balanced," i.e., which has an approximately equal number of 0's and 1's in $Z(S)$, where
$Z(S)$ denotes the matrix obtained from $Z$ by deleting all rows not in $S$. Then by asking the value of
$f(j)$ by an SQ(j), we can reduce the size of $S$ (i.e., the number of oracle candidates) by a constant
factor. Suppose otherwise, that there are no such good columns in $Z(S)$. Then we gather a certain
number of columns such that the set $T$ of these columns is "balanced," namely, such that the number
of rows which have a 1 somewhere in $T$ is a constant fraction of $|S|$. (See Fig. 1, where the columns in $T$
are shifted to the left.) Now we execute GQ(T) and we can reduce the size of $S$ by a constant fraction
according to whether GQ(T) returns nill ($S$ is reduced to $S_2$ in Fig. 1) or not ($S$ is reduced to $S_1$ in
Fig. 1). Then we move to the next iteration until $|S|$ becomes one.

The merit of using GQ(T) is obvious since it needs at most $O(\sqrt{|T|})$ queries, while we may need
roughly $|T|$ queries if we ask each position by S-queries. Even so, if $|T|$ is too large, we cannot tolerate
the cost of GQ(T). So, the key issue here is to set a carefully chosen upper bound on the size
of $T$. If we can select $T$ within this upper bound, then we are happy. Otherwise, we just give up
constructing $T$ and use another strategy which takes advantage of the sparseness of the current matrix
$Z(S)$. (Obviously $Z(S)$ is sparse since we could not select a $T$ of small size.)

It should also be noted that in each iteration the matrix $Z(S)$ should be one-sensitive, namely the
number of 1's is less than or equal to the number of 0's in every column. (The reason is obvious since
it does not make sense to try to find a 1 if almost all entries are 1.) For this purpose we implicitly apply
the column-flipping procedure in each iteration. Suppose that some column, say $j$, of $Z(S)$ has more
1's than 0's. Then this procedure "flips" the value of $f(j)$ by adding an extra circuit to the oracle
(but without any oracle call). Let this oracle be $f(\bar{j})$ and $Z(S(\bar{j}))$ be the matrix obtained by flipping
the column $j$ of $Z(S)$. Then obviously $f \in S$ iff the matrix $Z(S(\bar{j}))$ contains the row $f(\bar{j})$, i.e., the
problem does not change essentially. Note that the column-flipping is the same as that in [1], where
the OIP matrix was written as an $N \times M$ (number of columns $\times$ number of rows) 0-1 matrix instead
of the more common $M \times N$ one.
Figure 1: Reducing the candidate set by G-queries GQ(T) on the column set T

Figure 2: Constructing the column set T by RowCover(S, r)



3.2 Procedure RowReduction(T, l) for Reducing Oracle Candidates

This procedure narrows $S$ in each iteration, where $T$ is a set of columns and $l$ is an integer $\ge 1$
necessary for error control. See Procedure 1 for its pseudocode. Case 1: If $f$ has one or more 1's in $T$,
like $f_1$ in Fig. 1, then $k$ = GQ(T) gives us one of the positions of these 1's, say the circled one in the
figure. The procedure returns with the set $S_1$ of rows in the figure, i.e., the rows having a 1 in the
position selected by GQ(T). Case 2: If $f$ has no 1's in $T$, like $f_2$ in the figure, then $k$ = nill (i.e.,
GQ(T) answered correctly). Even if $k \ne$ nill (GQ(T) failed), Majority($k$, $l$, $f$), i.e., the majority
of $60l$ samples of $f(k)$, is 0 with high probability regardless of the value of $k$. Therefore the procedure
returns with the set $S_2$ of rows, i.e., the rows having no 1's in $T$. The parameter $l$ guarantees the
success probability of this procedure as follows.

Lemma 3 The success probability and the number of oracle calls in RowReduction(T, l) are $1 - O(l/3^l)$
and $l\,(O(\sqrt{|T|}) + l)$, respectively.

Proof. In each repetition, we need $O(\sqrt{|T|})$ oracle calls for the G-query and $O(l)$ calls (S-queries)
for Majority($k$, $l$, $f$). Thus the total number of calls is $l\,(O(\sqrt{|T|}) + l)$. For the success probability, let
us first consider Case 1 above. Since the G-queries are repeated up to $l$ times, the probability that
all tries fail (i.e., the following Majority = 0 every time) is at most $1/3^l$. When a G-query succeeds, the following Majority fails with
probability at most $1/3^l$ as well. (Here, the number of samples ($= 60l$) for the majority is set so that
the error probability is at most $1/3^l$ by the Chernoff bound.) Hence the total failure probability is
at most $O(1/3^l)$. In Case 2, since Majority fails with probability $O(1/3^l)$ in each iteration, the total
probability of failure is at most $O(l/3^l)$. □
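The claim that $60l$ samples push the majority error below $1/3^l$ can be checked numerically. The sketch below assumes the worst case allowed by the noisy model (each sample is wrong independently with probability exactly 1/3) and counts ties as failures; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def majority_error(num_samples: int, p_wrong: Fraction = Fraction(1, 3)) -> Fraction:
    """Exact probability that at least half of the samples are wrong.

    Samples are assumed independent, each wrong with probability p_wrong.
    Ties are counted as failures, which can only overestimate the error.
    """
    p_right = 1 - p_wrong
    threshold = (num_samples + 1) // 2   # smallest count of wrong samples that is >= n/2
    return sum(
        (comb(num_samples, w) * p_wrong**w * p_right ** (num_samples - w)
         for w in range(threshold, num_samples + 1)),
        Fraction(0),
    )


for l in range(1, 5):
    err = majority_error(60 * l)
    print(l, float(err), float(Fraction(1, 3**l)))
```

The same conclusion follows from the Hoeffding form of the Chernoff bound: with $n = 60l$ samples the error is at most $e^{-n/18} = e^{-10l/3}$, which is below $3^{-l} = e^{-l \ln 3}$.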



3.3 Procedure RowCover(S, r) for Collecting Positions of Queries

As mentioned in Sec. 3.1, we need to make a set $T$ of columns that is balanced as a whole. This
procedure is used for this purpose, where $Z(S)$ is the current matrix and $0 < r < 1$ controls the size
of $T$. See Procedure 2 for its pseudocode. As shown in Fig. 2, the procedure adds columns $t_1, t_2, \ldots$
to $T$ as long as a new addition $t_i$ increases the number of covered rows ($= |\mathrm{PositiveRow}(T, Z)|$) by at
least $r|S|$, or until the number of covered rows reaches $|S|/4$. We say that RowCover succeeds if it
finishes with $S'$ such that $|S'| \le 3|S|/4$, and fails otherwise. Suppose that we choose a smaller $r$. Then
this guarantees that the resulting $Z(S)$ when RowCover fails is sparser, which is desirable for
us as described later. However, since $|T| \le 1/r$, a smaller $r$ means a larger $T$ when the procedure
succeeds, which costs more for G-queries in RowReduction. Thus, we should choose the minimum $r$
such that the total cost, in the case that RowCover keeps succeeding, does not exceed the total
limit ($= O(\sqrt{N})$).
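A classical sketch of RowCover, following Procedure 2, is given below. Procedure 2 only asks for some column that covers at least $r|S|$ new rows; picking the column that covers the most uncovered rows is a greedy choice assumed here for definiteness. Function names, 0-based indices, and the toy matrix are illustrative.

```python
from typing import Iterable, List, Set


def positive_rows(cols: Iterable[int], z: List[List[int]], rows: Set[int]) -> Set[int]:
    """Rows among `rows` that have a 1 in at least one column of `cols`."""
    cols = list(cols)
    return {i for i in rows if any(z[i][j] == 1 for j in cols)}


def row_cover(s: Set[int], r: float, z: List[List[int]]) -> Set[int]:
    """Collect columns T until a quarter of the rows in S is covered, or no
    remaining column covers at least r*|S| of the still-uncovered rows."""
    n = len(z[0])
    t: Set[int] = set()
    uncovered = set(s)                       # S' in Procedure 2
    while len(positive_rows(t, z, s)) < len(s) / 4:
        # Greedy choice (an assumption): the column covering most uncovered rows.
        best = max(range(n), key=lambda j: len(positive_rows({j}, z, uncovered)))
        if len(positive_rows({best}, z, uncovered)) < r * len(s):
            break                            # RowCover "fails": Z(S) is sparse
        t.add(best)
        uncovered = s - positive_rows(t, z, s)
    return t


# Toy one-sensitive matrix with M = 8 distinct rows and N = 6 columns.
Z = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
]
print(row_cover(set(range(8)), 0.1, Z))   # small r: a covering T is found
print(row_cover(set(range(8)), 0.5, Z))   # large r: no column qualifies
```

Each accepted column covers at least $r|S|$ new rows, so the loop adds at most $1/r$ columns, which is the bound $|T| \le 1/r$ used in the text.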

3.4 Analysis of the Whole Algorithm 

Now we are ready to prove our first theorem: 

Theorem 1 The $M \times N$ OIP can be solved with a constant success probability by querying the blackbox
oracle $O(\sqrt{N})$ times if $M = \mathrm{poly}(N)$.

Proof. See Procedure 5 for the pseudocode of the algorithm ROIPS(S, Z) (Robust OIP algorithm
for Small Z). We call this procedure with $S = \{1, \ldots, M\}$ (we need this parameter since ROIPS is
also used in the later algorithm) and the given matrix $Z$. As described in Sec. 3.1, we narrow the
candidate set $S$ at lines 2 and 3. If RowCover at line 2 succeeds, then $|S|$ is sufficiently reduced. Even
if RowCover fails, $|S|$ is also reduced similarly if RowReduction at line 3 can find a 1 by G-queries.
Otherwise line 7 is executed, where the current oracle looks like $f_2$ in Fig. 1. In this case, by finding
a 1 in the positions $\{1, \ldots, N\} \setminus T$ by the G-query at line 7, $|S|$ is reduced to $|S| \log^4 N / N$, because
we set $r = \log^4 N / N$ at line 2. Since the original size of $S$ is $N^c$ for a constant $c$, line 7 is executed at
most $c + 1$ times.

Note that the selection of the value of $r$ at line 2 follows the rule described in Sec. 3.3: Since
$r = \log^4 N / N$, the size of $T$ at line 3 is at most $N / \log^4 N$. This implies that the number of oracle
calls at line 3 is $O(\log N \cdot \sqrt{N} / \log^2 N) = O(\sqrt{N} / \log N)$. Since line 3 is repeated at most $O(\log N)$
times, the total number of oracle calls at line 3 is at most $O(\sqrt{N})$. Line 7 needs $O(\sqrt{N})$ oracle calls,
but the number of its repetitions is $O(1)$ as mentioned above. Thus the total number of oracle calls
is $O(\sqrt{N})$.

Also by Lemma 3, the error probability of line 3 is at most $O(\log N / N)$. Since the number of
repetitions is $O(\log N)$, this error probability is obviously small enough. The error probability of line
7 is constant, but again this is not harmful since it is repeated only $O(1)$ times, and thus the error
probability can be made as small as needed at constant cost. □
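The accounting in the proof can be written out in one display. This is a restatement under the bounds already given (Lemma 3 with $l = \log N$ and $|T| \le N/\log^4 N$ for line 3, and $l = 1$ with $|T| \le N$ for line 7), with all constants suppressed.

```latex
\begin{align*}
\text{line 3, one call:}\quad
  & \log N \cdot \Bigl( O\bigl(\sqrt{N/\log^4 N}\bigr) + \log N \Bigr)
    = O\!\left(\frac{\sqrt{N}}{\log N}\right), \\
\text{line 3, total:}\quad
  & O(\log N) \cdot O\!\left(\frac{\sqrt{N}}{\log N}\right) = O(\sqrt{N}), \\
\text{line 7, total:}\quad
  & (c+1) \cdot 1 \cdot \bigl( O(\sqrt{N}) + 1 \bigr) = O(\sqrt{N}).
\end{align*}
```

The additive $\log^2 N$ term coming from the Majority samples at line 3 is dominated by $\sqrt{N}/\log N$ for large $N$, which is why it does not appear in the final bound.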

4 Algorithms for Large Candidate Sets 
4.1 Overview of the Algorithm 

In this section, our $M \times N$ input matrix $Z$ is large, i.e., $M$ is superpolynomial. We first observe how the
previous algorithm, ROIPS, would work for such a large $Z$. Due to the rule given in Sec. 3.3, the value
of $r$ at line 2 should be $\beta = \log M (\log\log M)^2 \log N / (2N)$. The calculation is not hard: Since we need
$\log M$ repetitions of the main loop, we should assign roughly $\log\log M$ to $l$ of RowReduction for a
sufficiently small error in each round. Then the cost of RowReduction will be $\sqrt{1/\beta} \cdot \log\log M$.
Furthermore, we have to multiply this by the number of repetitions, a $\log M$ factor, which gives us $\sqrt{N \log M / \log N}$,
the desired complexity. Thus it would be nice if RowCover kept succeeding. However, once RowCover
fails, each column can still include as many as $M\beta$ 1's, which obviously needs too many repetitions of
RowReduction at line 7 of ROIPS.
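The calculation behind this choice of $\beta$ can be spelled out as follows. It only expands the sentence above, with constant factors kept and lower-order terms dropped.

```latex
\begin{align*}
\underbrace{\log M}_{\text{rounds}} \cdot
\underbrace{\log\log M \cdot \sqrt{1/\beta}}_{\text{cost of one RowReduction}}
  &= \log M \,\log\log M \,
     \sqrt{\frac{2N}{\log M \,(\log\log M)^2 \log N}} \\
  &= \sqrt{\frac{2N \log M}{\log N}}
   \;=\; O\!\left(\sqrt{\frac{N \log M}{\log N}}\right).
\end{align*}
```

A smaller $\beta$ would make each RowReduction more expensive than this budget allows, while a larger $\beta$ would leave the matrix denser when RowCover fails.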

Recall that the basic idea of ROIPS is to reduce the number of candidates in the candidate set $S$
by halving (the first phase) while the matrix is dense, and to use the more direct method (the second
phase) after the matrix becomes sufficiently sparse. If the original matrix is large, this strategy fails
because, as mentioned above, the matrix does not become sufficiently sparse after the first phase. Now
our idea is to introduce an "intermediate" procedure which reduces the number of candidates more
efficiently than the first phase. For this purpose, we use RowReductionExpire_MTGS, which tries to
find a position of "1" in the oracle with multi-target Grover Search ($K > 1$ in Lemma 2) by assuming
that the portion of such positions, $K/N$, is sufficiently larger than $1/\beta$. If the assumption is indeed
true, then we apply RowReduction as before, and moreover the number of oracle calls spent by the G-queries in the main loop
of RowReduction is a constant times $\sqrt{N/K}$ on average.

However, it is of course possible that the actual number of repetitions is far from the
expected value. That is why we limit the maximum number of oracle calls spent in G-queries by
MAX_QUERIES(N, M), a properly adjusted number which depends on the size of the OIP matrix
and which will be referred to hereafter without its arguments for simplicity. If the value of COUNT reaches
this value, then the procedure expires (just stops) with no answer, but the probability of this is made
small by selecting MAX_QUERIES appropriately. Notice also that because of the failure of phase 1,
it is guaranteed that the number of 1's in each column is "fairly" small, which in turn guarantees that
the degree of row reduction is satisfactory for us. See Procedure 7 for our new algorithm ROIPL.

Finally, when the assumption is false, RowReductionExpire_MTGS finishes after $\log\log(\log M / \log N)$
iterations of its main loop. In this case, we can prove that the matrix of the remaining candidates is
very sparse and the number of its rows decreases exponentially by a single execution of
RowReductionExpire_MTGS. Thus one can achieve our upper bound also (details are given in the next section).

4.2 Justification of the Algorithm 

One can see that in ROIPL, oracle calls take place only at lines 6 and 11. As described in the previous
overview, the total number of oracle calls in RowReduction at line 6 is $O(\sqrt{N \log M / \log N})$, and the
whole execution of this part successfully ends with high probability. For the cost of line 11, we can
prove the following lemma.

Lemma 4 The main loop (lines 4 to 13) of ROIPL finishes with high probability before the value of
COUNT reaches MAX_QUERIES(N, M).

Proof. Note that there are two types of oracle calls in RowReductionExpire_MTGS at line 11. The
first type, Type A, is when the portion of "1" in the hidden oracle is at least $\frac{1}{4} \cdot \frac{\log|S|}{N \log N}$, and
the other type, Type B, is when the portion of "1" is smaller. Let $W = W_A + W_B$ be the expected
number of oracle calls, where $W_A$ is the expected number of Type A calls and $W_B$ that of Type B
calls. It is enough to prove that $W_A$ and $W_B$ are each at most a suitable constant fraction of
MAX_QUERIES. We defer the rigorous proofs to the Appendix and give instead the following simpler
averaging argument on the bounds of $W_A$ and $W_B$.

We first bound $W_A$. First, note that RowReductionExpire_MTGS
for Type A should require an $O(1)$ expected number of iterations of GQ, each of which requires
$O(\sqrt{N \log N / \log|S|})$ queries. Now, since phase 1 has failed, the number of rows having a "1" at
some position in $T = \{1, \ldots, N\}$ is at most $\beta|S|$. Thus, after the above $O(\sqrt{N \log N / \log|S|})$ queries the
number of candidates is reduced by a factor of $\beta = (1/2)^{\log(1/\beta)}$. Therefore, intuitively, to reduce the
number of candidates by half, the number of queries spent in GQ(T) is $O\bigl(\frac{1}{\log(1/\beta)} \sqrt{N \log N / \log|S|}\bigr)$.
Thus we have the following recurrence relation:

$$W_A(|S|) \le \max\bigl(W_A(1), W_A(2), W_A(3), \ldots, W_A(|S|/2)\bigr) + O\!\left(\frac{1}{\log(1/\beta)} \sqrt{\frac{N \log N}{\log|S|}}\right),$$

where $W_A(|S|)$ is the number of Type A queries to distinguish the candidate set $S$. Since ROIPL
starts with $|S| = M$ and ends with $|S| \approx N^{10}$ (note that $\beta|S| > 2$ if $|S| \approx N^{10}$), the above recurrence
relation resolves to the following:

$$\begin{aligned}
W_A(M) &\le W_A(M/2) + \frac{\sigma}{\log(1/\beta)} \sqrt{\frac{N \log N}{\log M}} \\
&\le \sigma\,\frac{\sqrt{N \log N}}{\log(1/\beta)} \left( \frac{1}{\sqrt{\log M}} + \frac{1}{\sqrt{\log M - 1}} + \cdots + 1 \right) \\
&\le 2\sigma\,\frac{\sqrt{N \log N}}{\log(1/\beta)} \sqrt{\log M} = 2\sigma\,\frac{\sqrt{N \log M \log N}}{\log(1/\beta)},
\end{aligned}$$



where $\sigma$ is a sufficiently large constant. Therefore, the total number of queries is $O(\sqrt{N \log M / \log N})$
since $\log(1/\beta) = \Omega(\log N)$ if $M \le 2^{N^d}$. Note that if the above averaging argument is correct, then $|S|$
can be reduced to a constant by just repeating line 11. However, this is not exactly true for ROIPL,
since $|S|$ can only be reduced until it becomes $\mathrm{poly}(N)$ in order to obtain the desired query
complexity (see the proof of Lemma 6 in the Appendix). Fortunately, in this case we can resort to ROIPS
for identifying the hidden oracle out of $\mathrm{poly}(N)$ candidates with just $O(\sqrt{N})$ queries as in line 16, and
thus achieve a similar result to that of the averaging argument.
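Resolving the recurrence relies on the elementary fact that $\sum_{j=1}^{m} 1/\sqrt{j} \le 2\sqrt{m}$, applied with $m = \log M$. The short check below illustrates that inequality numerically; the function name and the tested range are arbitrary choices.

```python
import math


def inverse_sqrt_sum(m: int) -> float:
    """Return 1/sqrt(1) + 1/sqrt(2) + ... + 1/sqrt(m)."""
    return sum(1.0 / math.sqrt(j) for j in range(1, m + 1))


for m in (1, 10, 100, 1000):
    print(m, inverse_sqrt_sum(m), 2 * math.sqrt(m))
```

The inequality itself follows from comparing the sum with the integral of $x^{-1/2}$, which gives the slightly stronger bound $2\sqrt{m} - 1$.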

For technical details of ROIPL, note that $\frac{1}{3}$MAX_QUERIES is ten times the expected total
number of queries supposing all queries are at line 11, i.e., the case with the biggest number of Type A
queries. By the Markov bound, the probability that the number of queries exceeds this amount is small
(at most 1/10). We summarize the property of RowReductionExpire_MTGS in the following lemma,
which can be proven similarly to Lemma 3.

Lemma 5 The success probability and the number of oracle calls of the procedure
RowReductionExpire_MTGS(T, l, COUNT, r) are $1 - O(l/3^l)$ and $l\,(O(\sqrt{1/r}) + l)$, respectively.
Moreover, if there are more than an $r$ fraction of 1's in the current oracle, then the average number of queries
is $O(\sqrt{1/r} + l)$.

We next bound $W_B$. In this case, MultiTargetGQ fails and therefore the
density of "1" in every row of the candidates is less than $\gamma = \frac{1}{4} \log|S| / (N \log N)$. Note that any two
rows in $S''$ (the new $S$ at the left-hand side of line 11) must be different, i.e., we have to generate
$|S''|$ different rows by using at most $\gamma N$ 1's for each row. Let $W$ be the number of rows in $S''$ which
include at most $2\gamma N$ 1's. Then $|S''| - W$ rows include at least $2\gamma N$ 1's, and hence the number of such
rows must be at most $|S''|/2$. Thus we have $|S''| - W \le |S''|/2$ and it follows that

$$|S''| \le 2W \le 2 \sum_{k=0}^{\lceil 2\gamma N \rceil} \binom{N}{k}.$$

The right-hand side is at most $2 \cdot 2^{N H(\lceil 2\gamma N \rceil / N)}$ (see e.g., [15], page 33), which is then bounded by $2|S|^{1/2}$
since $H(x) \approx x \log(1/x)$ for a small $x$. Thus, we have $|S''| \le 2|S|^{1/2}$. Hence, the number of
candidates decreases doubly exponentially, which means we need only $O(\log(\log M / \log N))$ iterations of
RowReductionExpire_MTGS to reduce the number of candidates from $M$ to $N^{10}$. Note that we
let $l = \log\log(\log M / \log N)$ at line 11 and therefore the error probability of a single iteration is at
most $O(1/\log(\log M / \log N))$. Considering the number of iterations mentioned above, this is enough
to obtain the claimed bound on $W_B$ (see the Appendix for the detailed proof, where the actual bound
on $W_B$ is shown to be much smaller). □
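The counting step uses the standard entropy bound on a partial sum of binomial coefficients, $\sum_{k \le k_{\max}} \binom{N}{k} \le 2^{N H(k_{\max}/N)}$ for $k_{\max} \le N/2$. The sketch below checks this bound numerically for small $N$; the function names and the small tolerance for floating-point rounding are assumptions of the example.

```python
from math import comb, log2


def binary_entropy(x: float) -> float:
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def prefix_binomial_sum(n: int, k_max: int) -> int:
    """Number of n-bit vectors with at most k_max ones."""
    return sum(comb(n, k) for k in range(k_max + 1))


def entropy_bound_holds(n: int, k_max: int) -> bool:
    """Check sum_{k <= k_max} C(n, k) <= 2^(n * H(k_max / n)), for k_max <= n/2."""
    lhs = log2(prefix_binomial_sum(n, k_max))
    rhs = n * binary_entropy(k_max / n)
    return lhs <= rhs + 1e-9   # tolerance for floating-point rounding


print(all(entropy_bound_holds(200, k) for k in range(101)))
```

In the proof the bound is applied with $k_{\max} = \lceil 2\gamma N \rceil$, which is far below $N/2$, so the number of sufficiently sparse rows is exponentially smaller than $2^N$.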

Now here is our main theorem in this paper. 

Theorem 2 The $M \times N$ OIP can be solved with a constant success probability by querying the blackbox
oracle $O(\sqrt{N \log M / \log N})$ times if $\mathrm{poly}(N) \le M \le 2^{N^d}$ for some constant $d$ ($0 < d < 1$).

Proof. The total number of oracle calls at line 6 is within the bound as described in Sec. 4.1, and the
total number of oracle calls at line 11 is bounded by Lemma 4. As for the success probability, we have
already shown that there is no problem for the total success probability of line 6 (Sec. 4.1) and line
11 (Lemma 4). Thus the theorem has been proved. □

4.3 OIP with o(N) queries 

Next, we consider the case when $M > 2^{N^d}$. Note that when $M = 2^{d'N}$, for a constant $d' \le 1$,
the lower bound on the number of queries is $\Omega(N)$ instead of $\Omega(\sqrt{N \log M / \log N})$. Therefore, it is
natural to expect that the number of queries exceeds our bound as $M$ approaches $2^N$. Indeed, when
$2^{N^d} < M < 2^{N/\log N}$, the number of queries of ROIPL is bigger than $O(\sqrt{N \log M / \log N})$ but still
better than $O(N)$, as shown in the following theorem.

Theorem 3 For $2^{N^d} \le M < 2^{N/\log N}$, the $M \times N$ OIP can be solved with a constant success probability
by querying the blackbox oracle $O\bigl(\sqrt{N \log N \log M} / \log(1/\beta)\bigr)$ times for $\beta = \min\bigl(\log M (\log\log M)^2 \log N / (2N),\ 1/4\bigr)$.

Proof. The algorithm is the same as ROIPL except for the following: At line 1, we set $\beta$ as before
if $M < 2^{N/\log^3 N}$. Otherwise, i.e., if $2^{N/\log^3 N} \le M < 2^{N/\log N}$, we set $\beta = 1/4$. Then, we can use
almost the same argument to prove the theorem, which is omitted. □

Remark 1 Actually the query complexity of Theorem 3 changes smoothly from $O(\sqrt{N \log M / \log N})$
to $O(N / \log N)$ and to $O(N)$ as $M$ changes from $2^{N^d}$ to $2^{N/\log^3 N}$ and to $2^{N/\log N}$, respectively. When
$M = 2^{N/\log N}$, the lower bound $\Omega(\sqrt{N \log M / \log N})$ in [1] becomes $\Omega(N / \log N)$. So it seems that
our upper bound is worse than this lower bound by a factor of $\log N$. However, if $M$ is this large,
then we can improve the lower bound to $\Omega(N / \sqrt{\log N \log\log N})$, and hence our upper bound is worse
than the lower bound only by at most a factor of $O(\sqrt{\log N \log\log N})$ in this range (see Appendix).

5 Concluding Remarks 

As mentioned above, our upper bound becomes the trivial $O(N)$ when $M = 2^{N/\log N}$, while for bigger
$M$, [12] has already given a nice robust algorithm which can be used for OIP with $O(N)$ queries. A
challenging question is whether or not there exists an OIP algorithm whose upper bound is $o(N)$ for
$M > 2^{N/\log N}$, say, for $M = 2^{N/\log\log N}$. Even more challenging is to design an OIP algorithm which
is optimal in the whole range of $M$. There are two possible scenarios: One is that the lower bound
becomes $\Omega(N)$ for some $M = 2^{o(N)}$. The other is that there is no such case, i.e., the bound is always
$o(N)$ if $M = 2^{o(N)}$. At this moment, we do not have any conjecture about which scenario is more
likely.
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Procedure 1: RowReduction(T, /) 

Require: T C {1, . . . , N} and I € N 

1: for j : ^ 1 to I do 

2: k <— GQ(T) 

3: if Majority(£;,min(Z,logiV),/) = 1 then 

4: return PositiveRow({fc}, Z) 

5: end if 

6: end for 

7: return {1, . . . ,M} \ PositiveRow(T, Z) 



Procedure 2: RowCover(S, r)
Require: S ⊆ {1, …, M} and 0 < r < 1
1: T ← {}
2: S' ← S
3: while ∃i s.t. |PositiveRow({i}, Z(S'))| > r|S| and |PositiveRow(T, Z(S))| < |S|/4 do
4:   T ← T ∪ {i}
5:   S' ← S \ PositiveRow(T, Z(S))
6: end while
7: return T  // by one-sensitivity, |PositiveRow(T, Z(S))| < 3|S|/4
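
As a purely classical illustration of RowCover, the sketch below greedily collects columns of a 0-1 matrix. The tie-breaking rule (take the first qualifying column) and the 0-based indexing are our assumptions; the pseudocode above allows any qualifying column.

```python
def positive_row(T, Z, rows):
    """Rows in `rows` that have a 1 in some column of T."""
    return {i for i in rows if any(Z[i][j] == 1 for j in T)}

def row_cover(S, r, Z):
    """Greedy sketch of RowCover(S, r) on the sub-matrix Z(S)."""
    S = set(S)
    T = set()
    remaining = set(S)                       # S' in the pseudocode
    while len(positive_row(T, Z, S)) < len(S) / 4:
        # First column that covers more than r|S| of the uncovered rows.
        pick = next(
            (j for j in range(len(Z[0]))
             if j not in T
             and len(positive_row({j}, Z, remaining)) > r * len(S)),
            None,
        )
        if pick is None:
            break
        T.add(pick)
        remaining = S - positive_row(T, Z, S)
    return T
```

The loop stops either when the chosen columns already cover a quarter of the rows or when no remaining column is dense enough, mirroring the two conditions in line 3.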



Procedure 3: PositiveRow(T, Z)
return {i | Z(i, j) = 1 for some j ∈ T}



Procedure 4: Majority(k, l, f)
return the majority of 60l samples of f(k) if k ≠ nil, else 0.
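
The Majority procedure is ordinary repetition and voting. The sketch below shows it against a simulated noisy oracle. The noise model (each answer is flipped independently with probability `error`) and the helper `make_noisy_oracle` are assumptions made for illustration, not the paper's oracle model.

```python
import random

def make_noisy_oracle(bits, error, rng):
    """Hypothetical noisy oracle: returns bits[k], flipped with probability `error`."""
    def f(k):
        flip = rng.random() < error
        return bits[k] ^ int(flip)
    return f

def majority(k, l, f):
    """Majority of 60*l samples of f(k); returns 0 when k is None (nil)."""
    if k is None:
        return 0
    samples = 60 * l
    ones = sum(f(k) for _ in range(samples))
    return 1 if 2 * ones > samples else 0

if __name__ == "__main__":
    rng = random.Random(1)
    f = make_noisy_oracle([0, 1, 1, 0], 0.2, rng)
    print(majority(1, 3, f), majority(0, 3, f))
```

With 180 samples and a 20% flip rate, a wrong majority is extremely unlikely, which is the effect the factor 60l is meant to achieve.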



Procedure 5: ROIPS(S, Z)
1: repeat
2:   T ← RowCover(S, log^4 N / N)
3:   S' ← S ∩ RowReduction(T, log N)
4:   if |S'| < … then
5:     S ← S'
6:   else
7:     S ← S' ∩ RowReduction({1, …, N} \ T, 1)
8:   end if
9: until |S| ≤ 1
10: return S
Procedure 6: RowReductionExpire_MTGS(T, l, COUNT, r)
the same as RowReduction(T, l) except that we add the following two: (i) the number of queries is added to
COUNT and the empty set is returned when COUNT exceeds MAX_QUERIES(N, M) (defined in ROIPL);
(ii) for r > 0, GQ(T) is replaced by MultiTargetGQ(T, r), a G-query on T assuming that there are more
than r fraction of 1's in the current oracle, and at line 7 the set of all rows that have at most r fraction of
1's is returned instead.



Procedure 7: ROIPL(Z)
Require: Z : M × N 0-1 matrix and poly(N) ≤ M ≤ 2^{N/log N}

1; ^ logMQoglogA^logiV . g = {1; _ ;M} 

2: MAX_QUERIES(N, M) ← 45σ · √(N log N log M) / log(1/β)  // σ: a constant factor of Robust Quantum Search in [21]
3: COUNT ← 0  // Increased in RowReductionExpire
4: repeat
5:   T ← RowCover(S, β)
6:   S' ← S ∩ RowReduction(T, log log M)
7:   if |S'| < (3/4)|S| then
8:     S ← S'
9:   else
10:     S ← S'

11: 5 <- 5 n RowReductionExpire_MTGS({l . . . N}, log log (gf ), COUNT, | I ^gL J )) 

12:   end if
13: until |S| < N^10
14: Z' ← Z(S)
15: relabel S and Z' so that the answer to OIP of Z can be deduced from that of Z'
16: return ROIPS(S, Z')
A Proof of Theorem 2



Theorem 2 can be shown by proving Lemma 4, which states that ROIPL succeeds in identifying the
blackbox oracle with constant probability using at most $O(\sqrt{N\log M/\log N})$ queries. Here, we provide
its detailed proof by showing the following lemmas. Notice that $\sigma$ is the constant factor in Lemma 2,
which can be computed from [21].



Lemma 6 With high probability, the total number of Type A queries at line 11 over all rounds
of ROIPL does not exceed $\frac{2}{3}\cdot$ MAX_QUERIES$(N,M) = 30\sigma\sqrt{N\log N\log M}/\log(1/\beta)$.

Lemma 7 With high probability, the total number of Type B queries at line 11 over all rounds
of ROIPL is less than $\frac{1}{3}\cdot$ MAX_QUERIES$(N,M) = 15\sigma\sqrt{N\log N\log M}/\log(1/\beta)$.

It remains to prove the above two lemmas.

Proof of Lemma 6. Before proving Lemma 6 we show the following:

Lemma 8 RowReductionExpire_MTGS at line 11 of ROIPL is executed at most $m^* = \lceil \log(M/N^{10})/\log(1/\beta) \rceil$
times.

Proof. RowReductionExpire_MTGS at line 11 is executed when the first RowReduction at line 6 cannot
reduce a 1/4 fraction of the rows. Thus, finding a position of "1" reduces the number of candidates
to at most a $\beta$ fraction. Hence, denoting the set of oracle candidates at round $k$ by $S_k$, $|S_k|$ is at most $M\beta^k$.
Since the loop of ROIPL stops once $|S| < N^{10}$, it follows that RowReductionExpire_MTGS is executed at most $m^* = \lceil \log(M/N^{10})/\log(1/\beta) \rceil$ times. □
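
The round count in Lemma 8 can be checked numerically. The sketch below assumes, as in the proof, that $|S_k| \le M\beta^k$ and that the loop of ROIPL stops once $|S| < N^{10}$; the function name and the convention of passing $\log_2 M$ and $\log_2 N$ (rather than $M$ and $N$, which may be astronomically large) are ours.

```python
import math

def max_expire_rounds(log_m: float, log_n: float, beta: float) -> int:
    """Smallest k with log M - k*log(1/beta) <= 10*log N (all logs base 2)."""
    shrink = math.log2(1.0 / beta)          # bits removed from log|S| per round
    return max(0, math.ceil((log_m - 10.0 * log_n) / shrink))

if __name__ == "__main__":
    # Example: log M = 1000, log N = 10, beta = 2^-9.
    print(max_expire_rounds(1000, 10, 2 ** -9))
```

A smaller $\beta$ removes more candidates per round and therefore gives fewer rounds, which is why $\log(1/\beta)$ appears in the denominator of the final query bound.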

Now, let us first bound the number of queries of Type A in RowReductionExpire_MTGS at line
11. For this purpose, let $X_k$ and $X$ be the random variables denoting the number of queries of
RowReductionExpire_MTGS at round $k$ and the total number of queries of RowReductionExpire_MTGS over
all rounds, respectively. Clearly, since for each trial of GQ(T) the success probability is at least
2/3, the average number of queries is:



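$$
\begin{aligned}
E[X] = \sum_{k=0}^{m^*-1} E[X_k]
 &\le \sum_{k=0}^{m^*-1} \frac{3}{2}\,\sigma\sqrt{\frac{N\log N}{\log|S_k|}} \\
 &\le \frac{3}{2}\,\sigma\cdot\sqrt{N\log N}\,\sum_{k=0}^{m^*-1}\frac{1}{\sqrt{\log M + k\log\beta}} \\
 &= \frac{3}{2}\,\sigma\cdot\sqrt{N\log N}\,\sum_{k=0}^{m^*-1}\frac{1}{\sqrt{10\log N + k\log(1/\beta)}} \quad\text{(reordering the summation)} \\
 &\le \frac{3}{2}\,\sigma\cdot\frac{\sqrt{N\log N\log M}}{\log(1/\beta)}.
\end{aligned}
$$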

Note that the fifth inequality is obtained by grouping the terms of the sum whose denominators lie between
$\sqrt{2^i\log N}$ and $\sqrt{2^{i+1}\log N}$; there are at most $2^i\log N/\log(1/\beta)$ of them.
When poly$(N) \le M \le 2^{N^d}$, $\log(1/\beta) = \Omega(\log N)$, and by the Markov bound, $\Pr[X \ge t\cdot E[X]] \le 1/t$,
i.e., the probability that Stage 2 ends in failure is at most $\Pr[X \ge 10E[X]] \le 1/10$. This proves the
lemma.
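
The two probabilistic facts used above (a trial that succeeds with probability at least 2/3 needs at most 3/2 attempts on average, and Markov's inequality bounds the chance of exceeding ten times the mean by 1/10) can be illustrated by simulation. The sketch below is a toy model under our own assumptions: every round costs one unit per attempt and succeeds with probability exactly `p`; the function names are hypothetical.

```python
import random

def trials_until_success(p, rng):
    """Number of independent attempts until the first success (geometric)."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

def markov_check(p, rounds, t, samples, seed=0):
    """Return (empirical mean of X, fraction of runs with X >= t * E[X])."""
    rng = random.Random(seed)
    expected = rounds / p                   # E[X] for `rounds` geometric rounds
    total, exceed = 0, 0
    for _ in range(samples):
        x = sum(trials_until_success(p, rng) for _ in range(rounds))
        total += x
        exceed += x >= t * expected
    return total / samples, exceed / samples

if __name__ == "__main__":
    print(markov_check(2 / 3, 20, 10, 2000))
```

In this toy model the tail is in fact far smaller than 1/10; Markov's inequality is used in the proof because it needs no assumption beyond the bound on the mean.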

Proof of Lemma 7. Since Type B queries are considered, the portion of "1" in the oracle
is less than $\frac{1}{4}\log|S|/(N\log N)$. Therefore if RowReductionExpire_MTGS does not finish after
$\log\log(\log M/\log N)$ repetitions, by Lemma 5 this case can be detected with probability at least
$1 - O(1/\log(\log M/\log N))$. And fortunately, since $|S_k|$, the number of the candidate oracles at round
$k$, is at most $2|S_{k-1}|^{1/2}$, this case happens only $\log(\log M/\log N)$ times in the whole course of the
algorithm. Thus we have the following recurrence relation:



$$W_B(|S_k|) \le W_B(|S_{k+1}|) + O\!\left(\sqrt{\frac{N\log N}{\log|S_k|}}\,\log\log\frac{\log M}{\log N}\right),$$

where $W_B(|S_k|)$ is the number of Type B queries needed to distinguish the candidate set $S_k$. This resolves to

$$W_B(|S_0|) \le \sum_{k\ge 0} \sigma\sqrt{\frac{N\log N}{\log|S_k|}}\cdot\log\log\frac{\log M}{\log N} \le 3\sigma\sqrt{N}\,\log\log(\log M/\log N),$$

which is much smaller than $\frac{1}{3}\cdot$ MAX_QUERIES since $\log\log x < \sqrt{x}$ for $x > 1$ and $\log(1/\beta) =
\Omega(\log N)$ for $M \le 2^{N^d}$. As can be seen in the above inequality, the number of queries at the last rounds,
namely, when $|S_k| = $ poly$(N)$, is the dominant factor because $|S_k|$ decreases doubly exponentially. This
concludes the proof.
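
The doubly exponential decrease is easy to see numerically. The sketch below iterates the worst case $|S_k| = 2|S_{k-1}|^{1/2}$ in log scale until the candidate set drops below $N^{10}$; working with $\log_2|S|$ instead of $|S|$, and the function name, are our choices for illustration.

```python
import math

def type_b_rounds(log_m: float, log_n: float) -> int:
    """Count steps of |S| -> 2*sqrt(|S|) until |S| < N^10 (logs base 2)."""
    log_s = log_m
    rounds = 0
    while log_s >= 10.0 * log_n:
        log_s = 1.0 + log_s / 2.0           # log2(2 * sqrt(|S|))
        rounds += 1
    return rounds

if __name__ == "__main__":
    log_m, log_n = 2.0 ** 20, 16.0
    print(type_b_rounds(log_m, log_n), math.log2(log_m / log_n))
```

For $\log M = 2^{20}$ and $\log N = 16$ this takes 13 steps, within the $\log(\log M/\log N) = 16$ allowed by the proof.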



B Slightly Better Lower Bounds for OIP 

Here, we will show that for $2^{N^d} \le M \le 2^{N/\log N}$ ROIPL is only a factor of $\sqrt{\log N\log\log N}$ worse than the
query-optimal algorithm. The following theorem is by [1].

Theorem 4 There exists an OIP whose query complexity is $\Omega(\sqrt{N\log M/\log N})$.

By a simple argument, the above theorem can be restated more accurately as follows.



Theorem 5 There exists an OIP whose query complexity is $\Omega(\sqrt{(N-k)(k+1)})$ when the number
of candidates $M$ satisfies

Proof. Similar to the proof of Theorem 2 in [1]. In fact, the proof of Theorem 2 of [1] already achieved
the above lower bound, but there $\Omega(\log M/\log N)$ is substituted for $k$, which is not done here because
the substitution can weaken the statement. □



Remark 2 A similar but weaker lower bound is also known: the lower
bound for OIP with $M$ candidates is $k$, where $k$ is the smallest integer satisfying

$$M \le \sum_{i=0}^{k}\binom{N}{i}.$$
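
The counting bound of Remark 2 is simple to evaluate. The sketch below assumes that the condition reads $M \le \sum_{i=0}^{k}\binom{N}{i}$ (the formula is damaged in the source, so this reading is an assumption) and returns the smallest such $k$; the function name is ours.

```python
from math import comb

def smallest_k(m: int, n: int) -> int:
    """Smallest k with m <= sum_{i=0}^{k} C(n, i)."""
    total = 0
    for k in range(n + 1):
        total += comb(n, k)
        if m <= total:
            return k
    raise ValueError("m exceeds 2^n, the number of n-bit oracles")

if __name__ == "__main__":
    # Partial sums for n = 8 are 1, 9, 37, 93, ...
    print([smallest_k(m, 8) for m in (1, 9, 10, 93, 256)])
```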

Now, we can state the following lemma. 






Lemma 9 For $M \le 2^{N/\log N}$, ROIPL is at most a factor of $O(\sqrt{\log N\log\log N})$ worse than the optimal
algorithm.

Proof. For $M = 2^{N/\log N}$, we can take $(k/N)\log(N/k) = 1/\log N$ since $\binom{N}{k} = 2^{(1-o(1))N H(k/N)}$ (see, e.g.,
[15], page 33), where $H(x) \approx x\log(1/x)$ for small $x$. By the previous theorem, there exists an
OIP whose query complexity is $\Omega(N/\sqrt{\log N\log\log N})$, while by Theorem 3 the query complexity of
ROIPL is only $O(N)$. □
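
The choice of $k$ in this proof can be checked numerically. The sketch below solves $(k/N)\log(N/k) = 1/\log N$ by bisection and compares $\sqrt{(N-k)(k+1)}$ with $N/\sqrt{\log N\log\log N}$. It is only an illustration: logarithms are base 2, the $o(1)$ term of the entropy estimate is ignored, and the function names are ours.

```python
import math

def solve_fraction(n: int) -> float:
    """x in (0, 1/e) with x * log2(1/x) = 1 / log2(n), found by bisection."""
    target = 1.0 / math.log2(n)
    lo, hi = 1e-12, 1.0 / math.e            # x*log2(1/x) is increasing here
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mid * math.log2(1.0 / mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def lemma9_gap(n: int):
    """Return k and the ratio sqrt((n-k)(k+1)) / (n / sqrt(log n * loglog n))."""
    k = int(solve_fraction(n) * n)
    lower = math.sqrt((n - k) * (k + 1))
    reference = n / math.sqrt(math.log2(n) * math.log2(math.log2(n)))
    return k, lower / reference

if __name__ == "__main__":
    print(lemma9_gap(2 ** 16))
```

For $N = 2^{16}$ the ratio stays within a constant of 1, consistent with the $\Omega(N/\sqrt{\log N\log\log N})$ claim.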